 trajectory prediction task


Reviews: Social-BiGAT: Multimodal Trajectory Forecasting using Bicycle-GAN and Graph Attention Networks

Neural Information Processing Systems

Summary: The paper proposes to improve interaction modeling between pedestrians in the trajectory prediction task by using a graph attention network [22], and to learn multimodal trajectory distributions by using Bicycle-GAN [23]. The experimental results show the effectiveness of the proposed approach, which achieves state-of-the-art performance on the public benchmarks. The authors also show that the proposed approach is more robust to varying K than the baselines, indicating that it addresses, to a certain extent, the high-variance issue of existing approaches.

Strengths: -- The paper is clearly written and easy to follow. The reasoning behind the choice of [22] and [23] for the trajectory prediction task is also clearly presented.
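The review does not say how K enters the evaluation. A common convention on pedestrian benchmarks, assumed here, is to draw K sampled trajectories and report the error of the best one. The sketch below illustrates that convention with average displacement error; the function names and the toy data are hypothetical and are not taken from the paper.

```python
import math

def average_displacement_error(pred, truth):
    """Mean Euclidean distance between predicted and true (x, y) points."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)

def best_of_k_ade(samples, truth):
    """Smallest average displacement error among the sampled trajectories."""
    return min(average_displacement_error(s, truth) for s in samples)

# Toy ground truth and three sampled predictions (illustrative values only).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
samples = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # offset by 1.0 at every step
    [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)],  # only the last point is off
    [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)],  # offset by 0.5 at every step
]

# Evaluate with the first K samples for several values of K.
for k in (1, 2, 3):
    print(k, best_of_k_ade(samples[:k], truth))
```

Under this convention the reported error can only stay the same or fall as K grows, so a model whose score changes little with K is one whose individual samples are already close to the ground truth.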


Missing Data: Datasets, Imputation, and Benchmarking

Neural Information Processing Systems

Datasets and code files are publicly accessible at Link. Our dataset will be hosted on both GitHub and a cloud storage drive. Code for TimesNet: Link. Code for SAITS: Link.

5.2 Trajectory Prediction Codes

The following are the codes for the trajectory prediction methods used in our work. The dataset is primarily created by an academic team (students and faculty). The data statistics are shown in Section 4 of the main paper.



Application of Vision-Language Model to Pedestrians Behavior and Scene Understanding in Autonomous Driving

Gao, Haoxiang, Zhao, Yu

arXiv.org Artificial Intelligence

Autonomous driving (AD) has experienced significant improvements in recent years and achieved promising 3D detection, classification, and localization results. However, many challenges remain, e.g., semantic understanding of pedestrian behavior and downstream handling of pedestrian interactions. Recent studies in applications of Large Language Models (LLM) and Vision-Language Models (VLM) have achieved promising results in scene understanding and high-level maneuver planning in diverse traffic scenarios. However, deploying billion-parameter LLMs on vehicles requires significant computation and memory resources. In this paper, we analyze effective knowledge distillation of semantic labels to smaller vision networks, which can be used for the semantic representation of complex scenes for downstream decision-making in planning and control.


Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models

Lan, Zhengxing, Li, Hongbo, Liu, Lingshan, Fan, Bo, Lv, Yisheng, Ren, Yilong, Cui, Zhiyong

arXiv.org Artificial Intelligence

Predicting the future trajectories of dynamic traffic actors is a cornerstone task in autonomous driving. Though notable existing efforts have resulted in impressive performance improvements, a gap persists in scene cognition and understanding of complex traffic semantics. This paper proposes Traj-LLM, the first to investigate the potential of using Large Language Models (LLMs) without explicit prompt engineering to generate future motion from agents' past/observed trajectories and scene semantics. Traj-LLM starts with sparse context joint coding to dissect the agent and scene features into a form that LLMs understand. On this basis, we explore LLMs' powerful comprehension abilities to capture a spectrum of high-level scene knowledge and interactive information. To emulate the human-like lane focus cognitive function and enhance Traj-LLM's scene comprehension, we introduce lane-aware probabilistic learning powered by the Mamba module. Finally, a multi-modal Laplace decoder is designed to achieve scene-compliant multi-modal predictions. Extensive experiments demonstrate that Traj-LLM, fortified by LLMs' strong prior knowledge and understanding prowess, together with lane-aware probabilistic learning, outperforms state-of-the-art methods across evaluation metrics. Moreover, the few-shot analysis further substantiates Traj-LLM's performance: with just 50% of the dataset, it outperforms the majority of benchmarks that rely on the complete data. This study explores equipping the trajectory prediction task with advanced capabilities inherent in LLMs, furnishing a more universal and adaptable solution for forecasting agent motion.
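The abstract names a multi-modal Laplace decoder without giving its formulation. As a rough illustration of what such an output head is typically scored with, the sketch below computes the negative log-likelihood of a ground-truth trajectory under a mixture of independent per-coordinate Laplace distributions. The mixture form, the shared `scale` parameter, and every name in the code are assumptions for illustration, not the Traj-LLM implementation.

```python
import math

def laplace_log_pdf(x, loc, scale):
    """Log density of a 1-D Laplace distribution at x."""
    return -math.log(2.0 * scale) - abs(x - loc) / scale

def mixture_nll(truth, modes, weights, scale=1.0):
    """Negative log-likelihood of `truth` under a Laplace mixture.

    truth:   list of (x, y) ground-truth points
    modes:   candidate trajectories, each a list of (x, y) locations
    weights: mixing probabilities, one per mode (assumed to sum to 1)
    scale:   Laplace scale shared by all coordinates (a simplification)
    """
    log_terms = []
    for mode, w in zip(modes, weights):
        log_lik = sum(
            laplace_log_pdf(tx, mx, scale) + laplace_log_pdf(ty, my, scale)
            for (tx, ty), (mx, my) in zip(truth, mode)
        )
        log_terms.append(math.log(w) + log_lik)
    # Log-sum-exp over modes, shifted by the maximum for numerical stability.
    top = max(log_terms)
    return -(top + math.log(sum(math.exp(t - top) for t in log_terms)))

# Toy example: one mode matches the ground truth, the other is far away.
truth = [(0.0, 0.0), (1.0, 0.0)]
good = [(0.0, 0.0), (1.0, 0.0)]
bad = [(0.0, 5.0), (1.0, 5.0)]

print(mixture_nll(truth, [good, bad], [0.9, 0.1]))
print(mixture_nll(truth, [good, bad], [0.1, 0.9]))
```

The loss is lower when most of the probability mass sits on the mode that matches the ground truth, which is the behavior a multi-modal decoder is trained toward.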


SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model

Bae, Inhwan, Park, Young-Jae, Jeon, Hae-Gon

arXiv.org Artificial Intelligence

There are five types of trajectory prediction tasks: deterministic, stochastic, domain adaptation, momentary observation, and few-shot. These associated tasks are defined by various factors, such as the length of input paths, data split, and pre-processing methods. Interestingly, even though they commonly take sequential coordinates of observations as input and infer future paths in the same coordinates as output, designing specialized architectures for each task is still necessary. When a model is applied to one of the other tasks, generality issues can lead to sub-optimal performance. In this paper, we propose SingularTrajectory, a diffusion-based universal trajectory prediction framework to reduce the performance gap across the five tasks. The core of SingularTrajectory is to unify a variety of human dynamics representations on the associated tasks. To do this, we first build a Singular space to project all types of motion patterns from each task into one embedding space. We next propose an adaptive anchor working in the Singular space. Unlike traditional fixed anchor methods that sometimes yield unacceptable paths, our adaptive anchor corrects anchors that are placed in a wrong location, based on a traversability map. Finally, we adopt a diffusion-based predictor to further enhance the prototype paths using a cascaded denoising process. Our unified framework ensures generality across various benchmark settings such as input modality and trajectory length. Extensive experiments on five public benchmarks demonstrate that SingularTrajectory substantially outperforms existing models, highlighting its effectiveness in estimating general dynamics of human movements. Code is publicly available at https://github.com/inhwanbae/SingularTrajectory.
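The abstract says only that anchors are adjusted based on a traversability map. One simple way to picture that idea, assumed here purely for illustration, is to snap an anchor that lands on a non-traversable grid cell to the nearest traversable one. The grid layout, the `snap_to_traversable` name, and the nearest-cell rule are hypothetical; the released SingularTrajectory code may work quite differently.

```python
import math

def snap_to_traversable(anchor, grid):
    """Return the traversable cell (row, col) closest to `anchor`.

    grid[r][c] is True where walking is possible. An anchor that already
    lies on a traversable cell is returned unchanged.
    """
    r0, c0 = anchor
    if 0 <= r0 < len(grid) and 0 <= c0 < len(grid[0]) and grid[r0][c0]:
        return anchor
    candidates = [
        (r, c)
        for r, row in enumerate(grid)
        for c, free in enumerate(row)
        if free
    ]
    if not candidates:
        raise ValueError("map has no traversable cell")
    # Pick the free cell with the smallest Euclidean distance to the anchor.
    return min(candidates, key=lambda cell: math.dist(cell, anchor))

# Toy 3x3 map: False marks obstacles (illustrative values only).
grid = [
    [True, True, False],
    [True, False, False],
    [True, True, True],
]

print(snap_to_traversable((1, 2), grid))  # anchor on an obstacle is moved
print(snap_to_traversable((0, 0), grid))  # anchor on free space is kept
```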


Group Activity Recognition in Basketball Tracking Data -- Neural Embeddings in Team Sports (NETS)

Hauri, Sandro, Vucetic, Slobodan

arXiv.org Artificial Intelligence

Like many team sports, basketball involves two groups of players who engage in collaborative and adversarial activities to win a game. Players and teams execute various complex strategies to gain an advantage over their opponents. Defining, identifying, and analyzing different types of activities is an important task in sports analytics, as it can lead to better strategies and decisions by the players and coaching staff. The objective of this paper is to automatically recognize basketball group activities from tracking data representing locations of players and the ball during a game. We propose a novel deep learning approach for group activity recognition (GAR) in team sports called NETS. To efficiently model the player relations in team sports, we combine a Transformer-based architecture with LSTM embedding and a team-wise pooling layer to recognize the group activity. Training such a neural network generally requires a large amount of annotated data, which incurs a high labeling cost. To address the scarcity of manual labels, we generate weak labels and pretrain the neural network on a self-supervised trajectory prediction task. We used a large tracking data set from 632 NBA games to evaluate our approach. The results show that NETS is capable of learning group activities with high accuracy, and that self- and weak-supervised training in NETS have a positive impact on GAR accuracy.
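A self-supervised trajectory prediction task needs no manual labels because the future part of each track serves as the target. The sketch below shows one plausible way to build such training pairs with a sliding window over an unlabeled track. The window lengths, the stride, and the function name are assumptions for illustration and are not taken from the NETS paper.

```python
def make_prediction_pairs(track, obs_len, pred_len, stride=1):
    """Split one track into (observed, future) training pairs.

    track:    list of (x, y) positions in time order
    obs_len:  number of steps given to the model as input
    pred_len: number of following steps the model must predict
    """
    window = obs_len + pred_len
    pairs = []
    for start in range(0, len(track) - window + 1, stride):
        observed = track[start:start + obs_len]
        future = track[start + obs_len:start + window]
        pairs.append((observed, future))
    return pairs

# Toy track of six positions (illustrative values only).
track = [(float(t), 0.5 * t) for t in range(6)]
pairs = make_prediction_pairs(track, obs_len=3, pred_len=2)

for observed, future in pairs:
    print(observed, "->", future)
```

Tracks shorter than `obs_len + pred_len` yield no pairs, so very short possessions would simply be skipped under this scheme.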

2209.00451